Posts Tagged: Data Science

February 16, 2022
January 28, 2022

📊 Datasets poem

Originally posted on Twitter.

Titanic is tiring 🚢

Iris is irritating 🥀

MNIST is too easy 🥱

Boston makes me queasy 🤢

California housing is not so bad 🏡

Sentiment analysis just makes me sad 🥲

Here are the datasets that I gravitate to... 🧵

What about you? 🙌

January 18, 2022
January 17, 2022

Criticizing Models

Originally posted on Twitter.

Criticizing your models is an important part of modeling.

In statistics this is well recognized. We check things like heteroskedasticity to avoid drawing the wrong conclusions.

What do you do in machine learning? Do you only check the cross-validation score?
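To make the heteroskedasticity check concrete, here is a minimal sketch using only NumPy. It computes the studentized (Koenker) form of the Breusch-Pagan statistic: fit an ordinary least squares model, regress the squared residuals on the same features, and multiply the resulting R² by the sample size. The synthetic data, the helper names `ols_fit` and `breusch_pagan_lm`, and the noise levels are assumptions made up for illustration, not something from the original thread.

```python
import numpy as np


def ols_fit(X, y):
    """Return least-squares coefficients and residuals for y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on X.

    X is expected to contain an intercept column.
    """
    _, resid = ols_fit(X, y)
    squared = resid ** 2
    _, aux_resid = ols_fit(X, squared)
    r2 = 1.0 - aux_resid.var() / squared.var()
    return len(y) * r2


# Synthetic data (illustrative assumption): one feature plus an intercept.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Same linear signal, two noise regimes.
y_constant = 2.0 * x + rng.normal(0.0, 3.0, size=n)      # constant noise
y_growing = 2.0 * x + rng.normal(0.0, 1.0, size=n) * x   # noise grows with x

lm_constant = breusch_pagan_lm(X, y_constant)
lm_growing = breusch_pagan_lm(X, y_growing)

# 5% critical value of a chi-squared distribution with 1 degree of freedom.
CHI2_CRIT_1DF_5PCT = 3.841

print(f"constant noise: LM = {lm_constant:.1f}")
print(f"growing noise:  LM = {lm_growing:.1f}")
print("heteroskedasticity flagged:", lm_growing > CHI2_CRIT_1DF_5PCT)
```

A statistic well above the critical value suggests the residual variance depends on the features, which is exactly the kind of problem a single cross-validation score will not reveal.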

January 12, 2022

📊 Three types of metrics for data science projects in professional organizations

If you're working on a data science project in a professional organization, you'll need to show the value you contribute. That's where metrics come in.

But choosing a metric is hard because there are usually multiple factors at play. I've encountered that in my daily work as a data scientist.

I've found it useful to organize metrics in a three-level framework. I learned about it in the book "Trustworthy Online Controlled Experiments" (Kohavi, Tang, and Xu, 2020).

I'll explain the three types of metrics below.

January 10, 2022

Why you should care about Data-Centric AI

If you care about delivering value with data science you should probably care about Data-Centric AI.

Data-Centric AI is about iterating on the data instead of the model architecture to create good machine learning models.

Read the full thread on Twitter where I explain why this approach matters for practical applications.

October 15, 2021

Winner of the Data-Centric AI Competition

Proud to announce that my team was one of the winners in Andrew Ng's Data-Centric AI Competition! This competition focused on improving model performance by enhancing the dataset rather than changing the model itself.

To learn more about our approach and the data-centric techniques we used, check out our detailed blog post where we share the three key tips that helped us succeed.

We also shared a blog post on DeepLearning.AI about our experience with the competition.

August 24, 2021

Three tips for Data-Centric AI and one data science lesson

This blog was originally posted at Xebia.com, my employer at the time of writing.

Andrew Ng (co-founder of Coursera, Google Brain, deeplearning.ai, landing.ai) is most famous for his Machine Learning course on Coursera. It teaches the basics of machine learning, how to create models and how to use them to predict with great accuracy.

Recently, he introduced the concept of Data-Centric AI. The idea is that rather than treating your dataset as fixed and focusing solely on improving your model setup, you focus on improving your dataset. He argues that this is often a much more effective way to improve performance.
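As a rough illustration of that idea, the sketch below keeps the model fixed (a plain 1-nearest-neighbor classifier) and only changes the training labels. Everything in it is an assumption chosen for the example: the two synthetic clusters, the 20% label noise, and the `relabel_by_neighbors` cleaning step. It is not the approach used in the competition, just one simple way to show a data-only iteration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_blobs(n_per_class):
    """Two well-separated 2-D clusters with labels 0 and 1 (synthetic data)."""
    X0 = rng.normal(loc=-3.0, scale=1.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=3.0, scale=1.0, size=(n_per_class, 2))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)


def one_nn_predict(X_train, y_train, X_test):
    """Fixed model: copy the label of the single nearest training point."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]


def relabel_by_neighbors(X, y, k=15):
    """Data step: set each label to the majority vote of its k nearest neighbors."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point does not vote for itself
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y[nearest].mean(axis=1)  # fraction of neighbors labeled 1
    return (votes > 0.5).astype(y.dtype)


X_train, y_true = make_blobs(200)
X_test, y_test = make_blobs(100)

# Simulate annotation mistakes by flipping roughly 20% of the training labels.
flip = rng.random(len(y_true)) < 0.2
y_noisy = np.where(flip, 1 - y_true, y_true)

# Same model before and after; only the training labels change.
acc_before = (one_nn_predict(X_train, y_noisy, X_test) == y_test).mean()
y_cleaned = relabel_by_neighbors(X_train, y_noisy)
acc_after = (one_nn_predict(X_train, y_cleaned, X_test) == y_test).mean()

print(f"accuracy with noisy labels:   {acc_before:.2f}")
print(f"accuracy with cleaned labels: {acc_after:.2f}")
```

The model code never changes between the two runs, so any difference in accuracy comes from the dataset alone.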

May 31, 2021

Catching chatbots by the long tail

This blog was originally posted at Xebia.com, my employer at the time of writing.

Have you ever had a conversation with a chatbot? Was it a positive experience? It might have been. But more likely than not, it left you a bit frustrated. Chatbots try to understand your message and help you with an appropriate response. However, most of the time they're not that great yet. In a way, chatbots are like baseball players.

"Baseball is the only field of endeavor where a man can succeed three times out of ten and be considered a good performer." - Ted Williams, baseball Hall of Famer

The same holds true for chatbots. A deflection percentage of 32% [users helped by the bot without human intervention] is what Google considers a success story!

Customer Story on Google Dialogflow's website. Retrieved 20 April 2021
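For readers unfamiliar with the term, the deflection percentage can be computed as in the small sketch below. The function name `deflection_rate` and the conversation counts are hypothetical, picked only so that the result matches the 32% figure quoted above.

```python
def deflection_rate(total_conversations, handed_to_human):
    """Share of conversations the bot resolved without human intervention."""
    if total_conversations <= 0:
        raise ValueError("total_conversations must be positive")
    return (total_conversations - handed_to_human) / total_conversations


# Hypothetical counts: 1000 conversations, 680 of which needed a human agent.
rate = deflection_rate(total_conversations=1000, handed_to_human=680)
print(f"deflection: {rate:.0%}")  # prints "deflection: 32%"
```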

As a data science consultant I've worked with multiple companies on chatbots and helped them do better. During these projects I have discovered a pattern that might help others build better chatbots too. In this article I outline three tips that should help you focus on what matters.

Catching chatbots by the long tail

Shell sage is an amazing tool that lets you chat with an LLM inside your terminal. It automatically loads your shell history into the context window so the LLM knows what you've been doing. Combine it with IPython and you've got an interactive Python environment for pairing with the LLM. Add ipdb and you've got an AI-enabled debugger.

February 07, 2020

Rhyme with AI

This blog was originally posted at Xebia.com, my employer at the time of writing.

Machines may take over the world within the year;
But creating rhymes instills in us the most fear!
Luckily, pre-trained neural networks are easy to apply.
With great pride we introduce our new assistant: Rhyme with AI.